feat(edge-optimize): source CloudFront control-plane from tokowaka-client#2682
Conversation
POST /sites/:siteId/llmo/edge-optimize-bootstrap-url returns a CloudFormation quick-create URL with a server-side presigned template URL, so a customer can create the cross-account Edge Optimize connector role in their own AWS account without a public S3 bucket and without any S3 access of their own. Presigning is done with the service execution role. Includes route + capability registration, OpenAPI spec, and unit tests. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The getRouteHandlers "segregates static and dynamic routes" test asserts the exact set of routes; add the new dynamic route to the expected list. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Hardcode EDGE_OPTIMIZE_TEMPLATE_BUCKET and EDGE_OPTIMIZE_TRUSTED_PRINCIPAL_ARN fallbacks so the dev/ci branch deploy returns a quick-create URL before those env vars are wired into Vault/secrets. Marked TEMPORARY / TODO REMOVE — revert before merge/prod (values must come from env config). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…p-url' into feat/llmo-edge-optimize-bootstrap-url
Use llmo-edgeoptimize-cf-template (in 682033462621, where the service deploys and signs) so the dev role reads it same-account; stage customer fetches via the presigned URL. Still TEMPORARY / TODO REMOVE before merge. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…p-url' into feat/llmo-edge-optimize-bootstrap-url
…ault The TEMPORARY hardcoded EDGE_OPTIMIZE_TEMPLATE_BUCKET default makes the bucket always set, so the 'not configured' guard can no longer be hit via an empty env. Exercise the same guard via the missing S3 client instead. TODO: restore the empty-bucket variant when the temp default is removed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Tighten the default lifetime of the bootstrap template presigned URL from 1h to 15m. The customer opens the quick-create link immediately, so a shorter TTL shrinks the leak window. A leaked URL only grants GetObject on the single template object until expiry; still override via EDGE_OPTIMIZE_PRESIGN_TTL. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…(Phase 2)
Backend for the CloudFront 'Deploy routing' wizard's read steps. The
api-service assumes the customer's cross-account connector role server-side
(no AWS creds in the browser):
- New src/support/edge-optimize.js: assumeConnectorRole (STS AssumeRole with
the per-session external ID) + listCloudFrontDistributions.
- POST /sites/:siteId/llmo/edge-optimize/connect - verifies the role is
assumable (returns { connected } so the UI can poll while the customer
creates the role); POST .../edge-optimize/distributions - lists the
account's CloudFront distributions. Both gated by site access + LLMO admin
and added to INTERNAL_ROUTES (not exposed to S2S).
- Adds @aws-sdk/client-sts and @aws-sdk/client-cloudfront.
- Unit tests for the support module (mocked SDK) and both handlers; route +
capability lists updated.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Document the already-shipped connect and distributions endpoints plus the new read-only prerequisites, origins, and behaviors endpoints for the CloudFront "Deploy routing" wizard. Adds shared connector/distribution request schemas and per-endpoint response definitions. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ints (Phase 2) Add three read-only CloudFront wizard endpoints, mirroring the existing connect/distributions handlers (12-digit accountId + externalId validation, site/access/LLMO-admin gate, assumed-role calls, badRequest on failure): - POST /sites/:siteId/llmo/edge-optimize/prerequisites -> checkEdgeOptimizePrerequisites reports connectorRole + cloudFrontRead checks (ok/false + detail, never 500) - POST /sites/:siteId/llmo/edge-optimize/origins -> getEdgeOptimizeOrigins returns origins + hasEdgeOptimizeOrigin detection - POST /sites/:siteId/llmo/edge-optimize/behaviors -> getEdgeOptimizeBehaviors returns default + ordered cache behaviors Adds getDistributionConfig() support fn (GetDistributionConfigCommand) and unit tests for the support fn, controller handlers, and route registration. All three routes added to INTERNAL_ROUTES (admin/IMS-only, not S2S). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sync per-step endpoints that assume the customer connector role server-side and perform one CloudFront write each (no AWS creds in the browser): - create-origin: add the EdgeOptimize_Origin (env EDGE_OPTIMIZE_ORIGIN_DOMAIN, default dev.edgeoptimize.net) via UpdateDistribution (ETag IfMatch). - create-function: create/update + publish the edgeoptimize-routing CF Function (bot-routing JS ported from the standalone wizard). - apply-cache: add EO headers to the behavior's custom cache policy (common path; legacy ForwardedValues / managed-policy clone left as TODO). - create-lambda: create exec role (bounded IAM-propagation retry) + the edgeoptimize-origin Lambda@Edge and publish a version. - apply-associations: wire the function (viewer-request) + Lambda (origin- request/response) onto the selected behavior. - verify: server-side bot-vs-human probe; passed requires x-edgeoptimize-request-id (x-edgeoptimize-fo = failover, not success). Adds @aws-sdk/client-iam + @aws-sdk/client-lambda. All AWS ops use ETag read-modify-write. Embedded function/Lambda code ported verbatim from the connect-aws-wizard; Lambda code inlined per the helix-deploy bundling rule. Mocked-SDK unit tests for all 6 support fns + handlers; routes/capabilities + OpenAPI updated. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nt-wizard' into feat/llmo-edge-optimize-cloudfront-wizard
The Default(*) behavior commonly uses an AWS-managed cache policy, which cannot be updated (UpdateCachePolicy -> 'update is not allowed for this policy'). applyEdgeOptimizeCacheHeaders now ports the full standalone-wizard logic with all three scenarios: - legacy (ForwardedValues, no CachePolicyId): add EO headers there + MinTTL 0 - custom policy: UpdateCachePolicy to add EO headers (existing path) - managed policy: CLONE into a custom edgeoptimize-cache policy with the EO headers, then repoint the behavior to it (idempotent by name) Adds GetCachePolicy/ListCachePolicies/CreateCachePolicy. Support tests rewritten to dispatch by command name and cover all three scenarios. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…endpoint
Fixes the Lambda step's three failure modes (timeout, 'update in progress',
no existence check):
- waitForLambdaIdle now gates on State Active AND LastUpdateStatus !=
InProgress (was State only), so we never hit ResourceConflictException
('update is in progress') on a retry after a slow/timed-out first call.
- createEdgeOptimizeLambda is fully idempotent: if a published numbered
version already matches the current code, reuse it (no update/publish);
otherwise update + publish. So a retry after a CDN first-byte timeout
returns immediately instead of conflicting.
- New read-only POST /sites/:siteId/llmo/edge-optimize/lambda-status
(getEdgeOptimizeLambdaStatus) so the wizard can detect on entry and poll
whether the function already exists with a published version.
Support tests rewritten to dispatch by command name; status tests added.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…dly)
create-lambda no longer blocks on a fresh function becoming Active (which
exceeded the CDN first-byte timeout -> 503). It now ensures the role + kicks
off the function create and returns { status: 'provisioning' | 'ready' }
immediately; the UI polls until a published version exists. Also:
- buildLambdaZip uses a fixed timestamp so CodeSha256 is deterministic
(no version churn).
- lambda-status now reports roleExists + a ready flag (role is created
synchronously by the create ack) so the wizard can show role + function
state and check on entry.
- Removed the in-request waitForLambdaIdle/UpdateFunctionCode blocking path.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
edge-optimize.test.js re-ran esmock() in beforeEach, re-instantiating the mocked AWS SDK module graph on every test. As this file grew this session it accumulated enough memory to push the 12.5k-test suite past the 4GB V8 heap limit (worker OOM -> '1 failing: Worker terminated' + lost-worker coverage dropping below the 90% gate). Move esmock to a single before() hook and reset only the send stubs per test. Suite run time for this file drops from ~4min to ~7s and the leak is gone. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ze-cloudfront-wizard
…nt-wizard' into feat/llmo-edge-optimize-cloudfront-wizard
The create-origin step created the EdgeOptimize_Origin without its custom headers, so the routing function's request could not authenticate to Edge Optimize or resolve the customer host - Verify never returned an x-edgeoptimize-request-id. - createEdgeOptimizeOrigin now sets x-edgeoptimize-api-key (site EO API key), x-forwarded-host (customer host), and optional x-edgeoptimize-fetcher-key, mirroring the standalone wizard + CloudFormation installer. - Self-heals: an origin created header-less by the earlier version is patched in place on re-run (returns updated: true). - Handler derives both server-side - api key from the tokowaka metaconfig (apiKeys[0]), forwarded host from calculateForwardedHost(site.baseURL) - so no new UI input; gateEdgeOptimizeWizard now returns the site to avoid a second fetch. - Verify: documented the prod TODO (probe the customer's real domain, not the *.cloudfront.net domain) - behavior unchanged for dev testing. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ze-cloudfront-wizard
Tighten the dev-only default for EDGE_OPTIMIZE_TRUSTED_PRINCIPAL_ARN from the whole dev account (arn:aws:iam::682033462621:root) to the exact assuming identity - the spacecat-api-service Lambda execution role (arn:aws:iam::682033462621:role/spacecat-role-lambda-generic) - shrinking the blast radius of the connector-role trust. No AWS-side change needed; the assuming identity is already that role. Prod must still set this via env to the prod execution role ARN (no in-code default) - tracked in the punch list. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
Thanks for the quick turnaround — but I think points 1–3 are addressing a different threat than the one I raised, so let me re-anchor. My (1) is not an unauthorized-caller / cross-account concern (I called that out as already covered by the role trust policy + LLMO authz). It's authorized-operator error: a legitimate caller pointing a real mutation at the wrong distribution. 1. 2. An FE warning isn't a server-side guardrail. This is exactly the gap, not the mitigation. The API mutates a customer's production CloudFront; a UI warning is ignorable, and it's bypassed entirely by any direct API call. The check has to live where the mutation is decided. 3. Point 3 actually confirms the hole — "a wrong distribution via direct API" is the failure mode I'm describing. And it's worse than "their own distribution": The fix is essentially free and server-side: |
…eric naming Review follow-ups (PR #2682, LLMO-5909 tracks the parked items): - Guardrail (#1): assertDistributionServesSite blocks any mutation unless the chosen distribution's CNAMEs include the site host; deploy hard-blocks, plan returns canProceed:false/blocker, granular mutators block too. Logged `allowDomainMismatch` override for edge cases. + distributionId format check. - Audit logging (#3a): structured before/after line per mutation { actor, siteId, accountId, distributionId, behavior, action, result, requestId, ts } (Splunk-queryable). - Error categorization (#3b): surface AccessDenied/PreconditionFailed/Throttling as 400 instead of a blanket 500 + "try again". - Naming aligned to the Cloudflare convention: "Edge Optimize" only on the EO origin, "CloudFront" for connector plumbing, "LLMO API key" for the credential, single [cdn-onboard-cloudfront] log tag. Parked → LLMO-5909: CloudTrail RoleSessionName=operator (after AMS validation) and a separately persisted audit record. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
Update — addressed @nit23uec's review (pushed in Distribution ↔ site guardrail (must-fix #1 / Environment-agnostic / stage flow ( Actor logging + error handling (must-fix #3) — every mutation now emits a structured before/after audit line Naming ( Parked → LLMO-5909: must-fix #2 (CloudTrail |
…-automation-simplified # Conflicts: # docs/index.html
…audit logs Replace the bespoke logMutation/actorOf/reqIdOf with the same auditLine + getCallerId helpers the Cloudflare onboarding controller uses (key=value, Splunk-greppable, action/outcome/caller/requestId + fields). Emits started/done/error per mutation. Trim a few over-long comments. Adds tests to cover the guardrail on every mutator + the audit caller/requestId fields (controller statements at 100%). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…-simplified' into feat/edge-optimize-cf-automation-simplified
| "@adobe/spacecat-shared-slack-client": "1.6.7", | ||
| "@adobe/spacecat-shared-tier-client": "1.5.1", | ||
| "@adobe/spacecat-shared-tokowaka-client": "1.19.0", | ||
| "@adobe/spacecat-shared-tokowaka-client": "^1.20.0", |
There was a problem hiding this comment.
better to pin the version
There was a problem hiding this comment.
Done — pinned to 1.20.0 (dropped the caret) in e21bb091, matching the exact-pin convention of the other @adobe/spacecat-shared-* deps. Lock updated to match.
nit23uec
left a comment
There was a problem hiding this comment.
pls pin the version of tokowaka-client. otherwise, looks good.
There was a problem hiding this comment.
Hey @ABHA61,
⚠ Degraded review - no spec document was found for this change (searched the PR links, the touched repos' docs, the architecture/guidelines docs, and linked Jira). This review covers code-level quality but could not validate the change against an agreed design, so confidence is reduced. Add a spec link (PR template section 4) and re-request review for a full-confidence pass.
Verdict: Request changes - input validation gap on targetedPaths needs defense-in-depth before this hits production edge infrastructure.
Complexity: HIGH - large diff; API surface + dependency change + cross-account AWS operations.
Changes: Extracts the CloudFront "Optimize at Edge" control-plane into a dedicated controller sourced from the shared tokowaka client, adding 16 wizard endpoints (14 files).
Note: Recommend a human read before merge - this change adds a new API surface (docs/openapi/llmo-api.yaml) with cross-account AWS operations. The bot review is a complement to, not a replacement for, a human read here.
Must fix before merge
-
[Important]
targetedPathsforwarded to CF Function code-generation without element-level validation -src/controllers/llmo/llmo-cloudfront.js:363The array passes only
Array.isArray()before being forwarded tocloudFrontClient.createCloudFrontFunction()which interpolates the values into CloudFront Function source code (confirmed in the shared client:JSON.stringify(targetedPaths)into avar TARGETED_PATHS = ...assignment). WhileJSON.stringifyprevents classic injection, defense-in-depth at the API boundary should validate each element is a well-formed URL path:if (targetedPaths) { if (targetedPaths.length > 50) return badRequest("targetedPaths exceeds maximum of 50 entries"); const invalid = targetedPaths.find((p) => typeof p !== "string" || !/^\/[a-zA-Z0-9\/_\-.*]+$/.test(p) || p.length > 500); if (invalid !== undefined) return badRequest("Each targetedPaths entry must be a valid URL path"); }
This rejects non-string elements, excessively long paths, and garbage input that would waste CloudFront Function size budget (10KB max).
-
[Important]
getPermissionsinner catch returns 400 for a server-side S3 failure -src/controllers/llmo/llmo-cloudfront.js:905The inner
catch (s3Error)returnsbadRequest('CloudFront connector permissions are not available')but S3 read failure is a server-side problem. This misleads client retry logic (clients do NOT retry 400s), suppresses it from server-error alerting, and violates HTTP semantics. Every other catch in this file correctly usesinternalServerErrorfor server failures. Fix:return internalServerError('CloudFront connector permissions are not available');
Non-blocking (4): minor issues and suggestions
- suggestion: AWS error messages surfaced via
mutationErrorResponsecan contain ARNs, account IDs, and resource names - return only theerror.nameto the caller and log the full message server-side -src/controllers/llmo/llmo-cloudfront.js:229 - nit:
verifyRoutingbuilds a URL from user-controlleddomainwithout hostname format validation (low risk given LLMO admin gate, but a regex rejecting path segments and RFC-1918 ranges would close the surface cheaply) -src/controllers/llmo/llmo-cloudfront.js:843 - suggestion: Add a test case that passes an actual
targetedPathsarray tocreateRoutingFunctionand asserts it is forwarded to the stub (currently only the null case is tested) -test/controllers/llmo/llmo-cloudfront.test.js - nit: Presigned URL TTL is env-configurable with no upper bound - consider
Math.min(presignTtlSeconds, 3600)as a guardrail against misconfiguration -src/controllers/llmo/llmo-cloudfront.js:100
Previously flagged, now resolved
- Controller extracted into dedicated llmo-cloudfront.js (well-structured at 1138 lines)
- Catch blocks return internalServerError with generic messages (except getPermissions, see finding 2)
- lambdaVersionArn validated with regex + account-ID cross-check
- Routes namespaced under /sites/{siteId}/llmo/cdn-onboard/cloudfront/*
- Credential checks centralized in validateCloudfrontCredentials with distributionId regex
- Gist dependency replaced with proper npm package
Skill: pr-review | Model: us.anthropic.claude-opus-4-6-v1[1m] | Duration: 1m 54s | Cost: $8.40 | Commit: ff73936a1d15804250346637fd608b13e4182973
If this code review was useful, please react with 👍. Otherwise, react with 👎.
Match the exact-pin convention of the other @adobe/spacecat-shared-* deps (drop the caret), per review. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
targetedPaths is an optional allowlist (route only these paths instead of all HTML pages), used mainly for testing. Cap the list (max 20 entries) and check each entry is a clean, bounded path. The value is JSON-embedded into the CloudFront Function (not interpolated), so this is a sanity/size guard, not an injection fix — a small list keeps the function well under the 10 KB code limit with headroom for the base function to grow. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
Re: must-fix #1 ( On the count: I used 20 rather than 50 because |
The inner catch in getPermissions returned badRequest for server-side failures (S3 read, YAML parse, missing template metadata) — all Adobe-infra problems, not client input. That misleads client retry logic, hides the failure from 5xx alerting, and breaks HTTP semantics. Remove the redundant inner try/catch and let the single outer catch return internalServerError, matching the Cloudflare controller and every other handler in this file. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
There was a problem hiding this comment.
Hey @ABHA61,
⚠ Degraded review - no spec document was found for this change (searched the PR links, the touched repos' docs, the architecture/guidelines docs, and linked Jira). This review covers code-level quality but could not validate the change against an agreed design, so confidence is reduced. Add a spec link (PR template section 4) and re-request review for a full-confidence pass.
Verdict: Approve - all prior findings addressed cleanly; incremental changes are well-tested and correctly scoped.
Complexity: HIGH - large diff; API surface + dependency change + cross-account AWS operations.
Changes: Pins tokowaka-client to exact 1.20.0, adds element-level validation on targetedPaths (count cap, length cap, charset allowlist), and fixes getPermissions to return 500 for server-side S3 failures (14 files).
Non-blocking (2): minor issues and suggestions
- nit:
getPermissionsnow returns 500 for both "S3 broke" and "template exists but has no permissions metadata" - if callers ever need to distinguish template-is-malformed from infrastructure-failure, a separate status or error code would help. Low priority given both were previously 400. -src/controllers/llmo/llmo-cloudfront.js:1104 - nit: The charset regex allows
*which has semantic meaning in CF Function routing. AtargetedPathsentry like/*would match all traffic rather than a targeted subset. Acceptable given the LLMO admin gate, but consider documenting or rejecting pure-wildcard entries if the intent is truly "targeted" paths. -src/controllers/llmo/llmo-cloudfront.js:583
Previously flagged, now resolved
- targetedPaths validated with count cap (20), length cap (256), and charset allowlist before forwarding to CF Function code-generation
- getPermissions returns internalServerError (500) for server-side S3/YAML failures instead of badRequest (400)
- tokowaka-client pinned to exact 1.20.0 (matching the convention of other @adobe/spacecat-shared-* deps)
Skill: pr-review | Model: us.anthropic.claude-opus-4-6-v1[1m] | Duration: 1m 32s | Cost: $5.81 | Commit: 5ec370587242137224ba9542b20ab67739af35d8
If this code review was useful, please react with 👍. Otherwise, react with 👎.
# [1.608.0](v1.607.0...v1.608.0) (2026-06-28) ### Features * **edge-optimize:** source CloudFront control-plane from tokowaka-client ([#2682](#2682)) ([46703d2](46703d2))
|
🎉 This PR is included in version 1.608.0 🎉 The release is available on GitHub release Your semantic-release bot 📦🚀 |
main #2682 shipped the CloudFront onboarding wizard (LlmoCloudFrontController, cdn-onboard/cloudfront/*) while this branch was open, making the branch's parallel edge-optimize/* wizard redundant. Reset onto main and rebuilt around the only net-new slice: CloudWatch log delivery. - Add enableCdnLogDelivery + rescanCdnLogDelivery on LlmoCloudFrontController under cdn-onboard/cloudfront/log-delivery and .../log-rescan - Assume-role externalId uses main's client-supplied per-session UUID (validateCloudfrontCredentials); imsOrgId for the org-scoped delivery destination is resolved server-side from the site's organization - Reuse tokowaka CloudFrontEdgeClient.listDistributions for the rescan (no local edge-optimize support module needed) - Add src/support/cdn-log-delivery.js (CloudWatch Logs delivery control plane) and @aws-sdk/client-cloudwatch-logs; OpenAPI + capability/route entries; tests Introduced by: #2680
What
Moves the CloudFront "Optimize at Edge" control-plane out of
spacecat-api-serviceand consumes it from the shared@adobe/spacecat-shared-tokowaka-clientinstead. The LLMO CloudFront controller noworchestrates onboarding through the client rather than a local copy of the AWS automation.
Why
The control-plane (assume-role + CloudFront/Lambda@Edge provisioning) is reusable infrastructure that belongs in the shared client alongside the existing CloudFront layer. Keeping it here duplicated AWS-SDK
logic in the controller and made it harder to reuse. Sourcing it from the client keeps the controller thin and gives a single home for the automation.
What it does
The
LlmoCloudFrontcontroller drives the "Deploy routing in CloudFront" wizard end to end, using the shared client:runDeployStep) and a non-mutating dry-run (planDeploy) advance the full sequence and report per-step status to the wizard.The Edge Optimize origin headers (
x-forwarded-host, API key) and the verify-probe domain are resolved server-side per target environment — the customer's production site, or their staging domain whenonboarding stage.
Changes
src/controllers/llmo/llmo-cloudfront.js+ routes under/sites/:siteId/llmo/cdn-onboard/cloudfront/*(connect, distributions, origins, behaviors, create-origin / -function / -lambda,apply-cache, associate, deploy, plan, verify, bootstrap-url, permissions).
@adobe/spacecat-shared-tokowaka-client(CloudFrontEdgeClient,assumeConnectorRole,verifyRouting); no local AWS-SDK copy in the controller.SPACECAT_CDN_CLOUDFRONT_TEMPLATE_BUCKET,SPACECAT_CDN_CLOUDFRONT_TRUSTED_PRINCIPAL_ARN,EDGE_OPTIMIZE_EDGE_DOMAIN.Tests
Controller unit tests for the CloudFront wizard (
llmo-cloudfront) and LLMO controller (llmo); lint, type-check, and build green.